A modular holistic approach to prosody modelling for Standard Yorùbá speech synthesis

Authors

Odétúnjí Àjàdí Odéjobí

Shun Ha Sylvia Wong

Anthony J. Beaumont

Abstract

This paper presents a novel prosody model in the context of computer text-to-speech synthesis applications for tone languages. We have demonstrated its applicability using the Standard Yorùbá (SY) language. Our approach is motivated by the theory that abstract and realised forms of various prosody dimensions should be modelled within a modular and unified framework (Coleman 1994). We have implemented this framework using the Relational Tree (R-Tree) technique. R-Tree is a sophisticated data structure for representing a multi-dimensional waveform in the form of a tree. The underlying assumption of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. To implement the intonation dimension, fuzzy logic based rules were developed using speech data from native speakers of Yorùbá. The Fuzzy Decision Tree (FDT) and the Classification And Regression Tree (CART) techniques were tested in modelling the duration dimension. For practical reasons, we have selected the FDT for implementing the duration dimension of our prosody model. To establish the effectiveness of our prosody model, we have also developed a Stem-ML prosody model for SY. We have performed both quantitative and qualitative evaluations on our implemented prosody models. The results suggest that, although the R-Tree model does not predict the numerical speech prosody data as accurately as the Stem-ML model, it produces synthetic speech prosody with better intelligibility and naturalness. The R-Tree model is particularly suitable for speech prosody modelling for languages with limited language resources and expertise, e.g. African languages. Furthermore, the R-Tree model is easy to implement, interpret and analyse.

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based and process-based approaches are the two main approaches to speech synthesis. Each has its own challenges. Unit-selection speech synthesis and statistical parametr...

Generating fundamental frequency contours for speech synthesis in yorùbá

We present methods for modelling and synthesising fundamental frequency (F0) contours suitable for application in text-to-speech (TTS) synthesis of Yorùbá (an African tone language). These methods are discussed and compared with a baseline approach using the HMM-based speech synthesis system HTS. Evaluation is done by comparing ten-fold cross validation squared errors on a small corpus of four s...

Quantifying the effect of corpus size on the quality of automatic diacritization of Yorùbá texts

Yorùbá, being a tone language, requires tone information for the correct pronunciation of words in Text-to-Speech synthesis. Based on standard Yorùbá orthography, such information is held in tone marks, which are applied to vowels and syllabic nasals as diacritical markings. However, the tone marks are not always correctly applied in many Yorùbá documents because appropriate input devices for the acc...

A target approximation intonation model for yorùbá TTS

A complete intonation model based on quantitative target approximation is described for Yorùbá text-to-speech (TTS) synthesis. This model is evaluated analytically and perceptually and compared to a fundamental frequency (F0) model using the standard HTS implementation. Analytical results suggest that the proposed approach more efficiently models F0 contours given typical data constraints in un...

Text analysis and language identification for polyglot text-to-speech synthesis

In multilingual countries, text-to-speech synthesis systems often have to deal with texts containing inclusions of multiple other languages in form of phrases, words, or even parts of words. In such multilingual cultural settings, listeners expect a high-quality text-to-speech synthesis system to read such texts in a way that the origin of the inclusions is heard, i.e., with correct language-sp...

Journal:

Computer Speech & Language

Volume 22

Published 2008
